Water Resources in Copenhagen during 20th century
This script visualizes the spatial component of the data accompanying the Spring 2021 course on the City: Between Culture and Nature, taught by Mikkel Thelle and Mikkel Høghøj. The course surveys the gradual appearance of private and public bathing facilities, toilets and communal hygienic resources in the city of Copenhagen during the 20th century. By editing elements in this script, you can explore different aspects of past and present hygienic amenities in the capital of Denmark.
Before we start: data wrangling
First load the packages necessary for spatial data visualisation and analysis.
library(sf)
library(tidyverse)
library(spatstat)
library(spatialkernel)
library(googlesheets4)
library(leaflet)Spatial data
Next, load your spatial data - polygons representing the suburbs of Copenhagen.
suburbs <- st_read("data/bydel.shp", options = "ENCODING=WINDOWS-1252") # thank you, Malte, for the options argument which fixed the Danish charsoptions: ENCODING=WINDOWS-1252
Reading layer `bydel' from data source `C:\Users\Adela\Documents\RStudio\1_Teaching\cds-spatial\Week07\data\bydel.shp' using driver `ESRI Shapefile'
Simple feature collection with 10 features and 4 fields
geometry type: MULTIPOLYGON
dimension: XY
bbox: xmin: 12.45305 ymin: 55.61284 xmax: 12.73425 ymax: 55.73271
geographic CRS: WGS 84
Simple feature collection with 6 features and 4 fields
geometry type: MULTIPOLYGON
dimension: XY
bbox: xmin: 12.46344 ymin: 55.63174 xmax: 12.73425 ymax: 55.73271
geographic CRS: WGS 84
id bydel_nr navn areal_m2
5 6 6 Vanløse 6699011
6 5 5 Valby 9234177
7 4 4 Vesterbro-Kongens Enghave 8472602
8 1 1 Indre By 10488466
9 9 9 Amager Øst 9820410
10 2 2 Østerbro 9858740
geometry
5 MULTIPOLYGON (((12.4982 55....
6 MULTIPOLYGON (((12.52434 55...
7 MULTIPOLYGON (((12.54553 55...
8 MULTIPOLYGON (((12.72897 55...
9 MULTIPOLYGON (((12.63082 55...
10 MULTIPOLYGON (((12.59777 55...
[1] 3 7 8 10 6 5 4 1 9 2
Attribute data
Next let’s bring in the attribute data. I read the data from a google sheet where my colleagues and I can edit it. You can load it from there if you have a googlesheets4 package installed, or you can use the read_csv() function to read the wc.csv provided in the data folder
# Uncomment the lines below to read data from GDrive wc <-
# read_sheet('https://docs.google.com/spreadsheets/d/1iFvycp6M6bF8GBkGjA2Yde2yCIhiy5_slAkGF-RUF7w/edit#gid=0',
# col_types = 'cnnnnnnnn') write_csv(wc, 'data/wc.csv')
wc <- read_csv("data/wc.csv")
wc# A tibble: 100 x 9
Suburb suburb_id flats wc_access bath year bath_communal_ct wc_communal_ct
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Indre~ 1 16280 11310 3800 1950 NA NA
2 Chris~ 1 5490 3900 900 1950 NA NA
3 Voldk~ 1 13460 12690 4560 1950 NA NA
4 Øster~ 2 30820 28900 13750 1950 NA NA
5 Indre~ 3 28700 24380 5910 1950 NA NA
6 Ydre ~ 3 21710 20410 5800 1950 NA NA
7 Veste~ 4 25850 23930 3730 1950 NA NA
8 Konge~ 4 6270 6240 4240 1950 NA NA
9 Valby 5 14430 13970 8190 1950 NA NA
10 Viger~ 5 7700 7580 5050 1950 NA NA
# ... with 90 more rows, and 1 more variable: hot_water <dbl>
Spatial resolution adjustment - data aggregation
Data on access to hygienic facilities and other water resources in Copenhagen now looks good and tidy, but its spatial resolution is higher than the provided polygons (as in we have multiple rows that all fit within one suburb id). We therefore use the group_by() function to aggregate the data by id before we continue with any spatial operations. Given that the dataset is in fact a time-series, and each kvarter has a record for a given year or decade, we need to group first by the year and then only by id.
While aggregating the finer scale data into larger units, it is convenient to generate some statistics, such as percentages of flats that have bath and wc and hot water access within each suburb. We do this using the summarize() function below.
wcdata <- wc %>% group_by(year, suburb_id) %>% summarize(flats = sum(flats), bath = sum(bath),
pct_bath = bath/flats * 100, wc_access = sum(wc_access), pct_wc = wc_access/flats *
100, warmH20 = sum(hot_water), pct_wH20 = warmH20/flats * 100, communal_wc = sum(wc_communal_ct),
communal_bath = sum(bath_communal_ct))
wcdata# A tibble: 50 x 11
# Groups: year [5]
year suburb_id flats bath pct_bath wc_access pct_wc warmH20 pct_wH20
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1950 1 35230 9260 26.3 27900 79.2 8530 24.2
2 1950 2 30820 13750 44.6 28900 93.8 10750 34.9
3 1950 3 50410 11710 23.2 44790 88.9 8810 17.5
4 1950 4 32120 7970 24.8 30170 93.9 6110 19.0
5 1950 5 22130 13240 59.8 21550 97.4 10830 48.9
6 1950 6 10260 6780 66.1 10120 98.6 6270 61.1
7 1950 7 27260 14790 54.3 26770 98.2 13280 48.7
8 1950 8 19270 15000 77.8 18980 98.5 14690 76.2
9 1950 9 23960 12470 52.0 22950 95.8 11210 46.8
10 1950 10 18000 9030 50.2 16200 90 7800 43.3
# ... with 40 more rows, and 2 more variables: communal_wc <dbl>,
# communal_bath <dbl>
Join the aggregated attribute data with its spatial representations
Now we can join the data with the spatial polygons
Simple feature collection with 50 features and 14 fields
geometry type: MULTIPOLYGON
dimension: XY
bbox: xmin: 12.45305 ymin: 55.61284 xmax: 12.73425 ymax: 55.73271
geographic CRS: WGS 84
First 10 features:
id bydel_nr navn areal_m2 year flats bath pct_bath wc_access pct_wc
1 1 1 Indre By 10488466 1981 26413 14035 53.13671 22546 85.35948
2 1 1 Indre By 10488466 1950 35230 9260 26.28442 27900 79.19387
3 1 1 Indre By 10488466 1965 32470 12780 39.35941 32450 99.93840
4 1 1 Indre By 10488466 1970 30440 11386 37.40473 22381 73.52497
5 1 1 Indre By 10488466 1955 33490 9740 29.08331 33430 99.82084
warmH20 pct_wH20 communal_wc communal_bath geometry
1 NA NA NA NA MULTIPOLYGON (((12.72897 55...
2 8530 24.21232 NA NA MULTIPOLYGON (((12.72897 55...
3 NA NA 9510 2280 MULTIPOLYGON (((12.72897 55...
4 NA NA NA NA MULTIPOLYGON (((12.72897 55...
5 9130 27.26187 NA NA MULTIPOLYGON (((12.72897 55...
[ reached 'max' / getOption("max.print") -- omitted 5 rows ]
Now that we have a merged spatial dataset with attributes, let’s review what attributes are available for visualisation
[1] "id" "bydel_nr" "navn" "areal_m2"
[5] "year" "flats" "bath" "pct_bath"
[9] "wc_access" "pct_wc" "warmH20" "pct_wH20"
[13] "communal_wc" "communal_bath" "geometry"
There is the suburb polygon data, such as id, bydel_nr, navn and areal_m2, and there is also the attribute data such as year, flats, bath ,etc. This gives us lots of choices for display. Lets put the data in a map.
Plot the data on the map
Let’s start by plotting one year alone, to learn how the map works.
Flats and water resources in 1950
Run the whole chunk below, and once it renders, look at the map. Afterwards, try changing the definition of what is to be displayed on line 116. For example, replace "flats" for some other column, such as "pct_bath", or "wc_access" to see how the map changes. To modify the legend, you can modify line 118 where we describe style. Replace style = "jenks" with "pretty", or "equal", or "quantile". What happens to your classification?
wc1950 <- wc_spatial %>% filter(year == 1950)
library(tmap)
tmap_mode(mode = "plot")
tm_shape(wc1950) + tm_borders(col = "black", lwd = 1) + tm_polygons("flats", id = "navn",
style = "jenks") + tm_legend(legend.position = c("RIGHT", "TOP")) + tm_compass(position = c("RIGHT",
"BOTTOM"), type = "rose", size = 2) + tm_scale_bar(position = c("RIGHT", "BOTTOM"),
breaks = c(0, 2, 4), text.size = 1) + tm_credits(position = c("RIGHT", "BOTTOM"),
text = "Adela Sobotkova, 2021") + tm_layout(main.title = "Copenhagen 1950 situation",
legend.outside = FALSE)Flats through time
Now, that you have mastered visualization of a single year, let’s plot all the years we have available!
tmap_options(limits = c(facets.view = 5)) # we want to view 5 periods
tmap_mode(mode = "view")
tm_shape(wc_spatial) + tm_facets(by = "year", ncol = 3, nrow = 2) + tm_polygons("flats",
id = "navn", style = "jenks") + tm_layout(main.title = "Copenhagen Flats", legend.outside = TRUE)## Lets look at flats per square kilometer Now that we have a spatial object, we can create new columns, for example utilizing the shape area to calculate the density of flats per sq km.
library(tmap)
tmap_options(limits = c(facets.view = 5)) # we want to view 6 years
tmap_mode(mode = "view")
tm_shape(wc_spatial) + tm_facets(by = "year", ncol = 3, nrow = 2) + tm_polygons("flat_per_km",
n = 5, style = "jenks") #+
## Access to toilets and baths, per suburb and sq kilometer
Lets calculate the baths and toilets available per square kilometer per each suburb
library(tmap)
tmap_options(limits = c(facets.view = 5)) # we want to view 6 years
tmap_mode(mode = "view")
tm_shape(wc_spatial) + tm_facets(by = "year", ncol = 3, nrow = 2) + tm_polygons("pct_bath",
id = "navn", style = "pretty", title = "% of flats with <br> access to bath") #+library(tmap)
tmap_options(limits = c(facets.view = 5)) # we want to view 5 periods
tmap_mode(mode = "view")
tm_shape(wc_spatial) + tm_facets(by = "year", ncol = 3, nrow = 2) + tm_polygons("pct_wc",
id = "navn", style = "pretty", title = "% of flats with <br>access to WC")## You can further recalculate the number of baths per sq kilometer
..or continue with communal resources and warm water
Why not practice and try plotting the flats that have access to communal baths and wc, and or hot water? Create your own map here, following the examples above.
Access OSM data for Copenhagen and retrieve (whatever would be relevant?)
The OpenStreetMap contains free and open spatial data for physical features on the ground, with each features’ type being define using key:value pair tags. Each tag describes a geographic attribute of the feature being shown by that specific node, way or relation.
Use:
osmdata:opq()to define the bounding box of the osm requestosmdata:add_osm_feature()to define the key:value pairs you are looking forosmdata:osmdata_sf()to retrieve the osm data.
library(osmdata)
# Create a bounding box
bb <- suburbs %>% st_transform(4326) %>% st_bbox()
plot(bb)q <- opq(bbox = bb, timeout = 180)
qa <- add_osm_feature(q, key = "amenity", value = "public_bath")
# qb <- add_osm_feature(q, key = 'amenity',value = 'drinking_water')
qc <- add_osm_feature(q, key = "amenity", value = "shower")
qd <- add_osm_feature(q, key = "amenity", value = "toilets")
# qe <- add_osm_feature(q, key = 'amenity',value = 'water_point')
public_bath <- c(osmdata_sf(qa), osmdata_sf(qc), osmdata_sf(qd))Clean up OSM data
Use the following code to clean the results and project them in Danish UTM.
This code:
- removes the duplicated geometries thanks to
osmdata::unique_osmdata(see the documentation for details) - projects into WGC84 UTM32
- keeps the name attribute only
- computes the centroids for the baths stored as polygons
- Eventually, the baths outside our CPH suburbs are removed.
library(osmdata)
bath_uniq <- unique_osmdata(public_bath)
rpoint <- bath_uniq$osm_points %>% filter(!is.na(amenity)) %>% st_transform(32632) %>%
dplyr::select(name)
rpoly <- bath_uniq$osm_polygons %>% st_transform(32632) %>% dplyr::select(name) %>%
st_centroid()
baths_osm <- rbind(rpoly, rpoint)
baths_osm <- st_intersection(baths_osm, st_transform(suburbs, 32632) %>% st_geometry() %>%
st_union())
# transform also historical baths
baths_cph <- wc_spatial %>% st_centroid() %>% st_transform(32632) %>% mutate(radius = sqrt(bath_per_km)) %>%
arrange(desc(bath_per_km))Display two maps side-by-side
Now, let’s display the results in two synchronized mapview maps:
- one with bathing resources in suburbs
- another one with baths extracted from OSM.
- Use the
mapview::syncfunction to display both maps side by side with synchronisation.
library(mapview)
# library(leafsync) library(leaflet)
map_osm <- mapview(baths_osm, map.types = "OpenStreetMap", col.regions = "#940000",
label = as.character(suburbs$name), color = "white", legend = FALSE, layer.name = "Baths in OSM",
homebutton = FALSE, lwd = 0.5)
# test map
mapview(baths_cph[, -3], map.types = "Stamen.TonerLite", cex = "radius", legend = FALSE,
col.regions = "#217844", lwd = 0, alpha = 0.4)map_cph <- mapview(baths_cph[, -3], map.types = "OpenStreetMap", col.regions = "#940000",
color = "white", cex = "bath_per_km", legend = TRUE, layer.name = "Baths per sq km <br>in suburbs from 1970",
homebutton = FALSE, lwd = 0.5)
sync(map_osm, map_cph)Improve the display with some comparable dataset
It might be better to combine the OSM data with the public bathhouse data that we had looked at previously in Leaflet.
We need to
- load the data from googlespreadsheet
- filter out missing coordinates and convert to sf object
- project to WGS84 UTM 32
# baths <- read_sheet("https://docs.google.com/spreadsheets/d/15i17dqdsRYv6tdboZIlxTmhdcaN-JtgySMXIXwb5WfE/edit#gid=0",
# col_types = "ccnnncnnnc")
# write_rds(baths,"data/baths.rds")
baths <- read_rds("data/baths.rds")
names(baths) [1] "BathhouseName" "Coordinates" "Longitude"
[4] "Latitude" "Quality" "AlternativeCoordinates"
[7] "Long" "Lat" "ErrorRadius_m"
[10] "Notes"
hist_bathhouses <- baths %>%
dplyr::select(BathhouseName,Longitude,Latitude,Quality) %>%
filter(!is.na(Longitude)) %>%
st_as_sf(coords=c("Longitude", "Latitude"), crs = 4326)
hist_baths <- st_transform(hist_bathhouses, crs=32632)
#test map
library(mapview)
mapview(hist_baths, map.types = "Stamen.TonerLite",
#cex="radius", legend=FALSE,
col.regions="#217844", lwd=0, alpha=0.4)Now, let’s load this projected historical bathouse object in the synced map so we can compare the locations with OSM data.
library(mapview)
map_osm <- mapview(baths_osm, map.types = "OpenStreetMap",
col.regions = "#940000",
label = as.character(suburbs$name),
color = "white", legend = FALSE, layer.name = "Baths in OSM",
homebutton = FALSE, lwd = 0.5)
map_hist <- mapview(hist_baths,
map.types = "OpenStreetMap",
col.regions = "#940000",
color = "white",
# cex = "bath_per_km",
legend = TRUE,
layer.name = "Public bathhouses, early 20th century",
homebutton = FALSE, lwd = 0.5)
sync(map_osm,map_hist)Lovely two different patterns, showing current public baths and toilets in Copenhagen and historical ones. The city has grown (how much?) and so clearly have the hygienic facilities. In the next section, you can see how we may formally evaluate the similarity of spatial patterning between the historical and current data.
Comparing two point patterns. How do we best do it?
We have two patterns, historical and OSM data. Are they similar or dissimilar? How do the patterns of historical and current public bathhouses compare beyond a quick eyeball evaluation?
Here we might be able to use some statistical functions that contrast nearest neighbor distances or multi-distance clustering across the two groups.
We should first check the nature of data: do both patterns represent completely mapped data rather than sampled data (where the nature of sampling can affect the comparison)? If the former, one could use nearest neighbor, K-function or Monte Carlo reassignment.
For a tutorial on Kcross function, see Manny G’s contribution to this exchange https://gis.stackexchange.com/questions/4484/comparing-two-spatial-point-patterns#4490
Before we try some cross-functions, we need to wrangle
But first we need to recast the baths as ppp object. Note: st_union did not work as expected (it is multiplying the features), and so I did a workaround and combined the baths sf objects. En route I found nd this neat post on unioning using Danish municipalities https://gis.stackexchange.com/questions/278818/fastest-way-to-union-a-set-of-polygons-in-r
library(spatstat)
# Prepare the ppp object
# Rebuild ppp from scratch via a combined sf object
st_coordinates(hist_baths) # 21 coordinates X Y
1 724117.6 6174051
2 725157.1 6175185
3 723296.7 6174805
4 725948.5 6175278
5 725785.8 6180317
6 723559.3 6177700
7 722834.5 6174585
8 720272.8 6173844
9 723678.3 6176934
10 724707.2 6178993
11 722185.4 6176224
12 722122.5 6176588
13 725335.2 6184015
14 725304.9 6180228
15 725785.8 6180317
16 723873.5 6175323
17 724179.9 6175885
18 718230.0 6178817
19 719772.3 6178636
20 721614.8 6178171
21 725785.8 6180317
X Y
1 725850.1 6177703
2 726493.8 6175590
3 726459.3 6175654
4 725095.5 6176950
5 725103.7 6176944
6 724953.2 6177691
7 725034.9 6174979
8 724863.2 6176972
9 725852.9 6175423
10 724030.8 6178529
11 720642.1 6178241
12 724272.1 6178188
13 724093.6 6176268
14 725328.6 6171078
15 725615.2 6175893
16 723943.1 6175166
17 722098.1 6174087
18 724776.9 6176260
19 730341.6 6181001
20 730320.1 6181110
21 730172.2 6181150
22 726178.4 6176088
23 724733.4 6180146
24 721246.1 6174057
25 719828.4 6178706
26 722955.9 6174727
27 723389.2 6176931
28 723352.3 6174642
29 724960.3 6174971
30 724989.1 6175004
31 724958.6 6174972
32 721255.6 6180747
33 728845.7 6174575
34 729144.4 6174067
35 729568.9 6173671
36 725908.7 6177944
37 724345.4 6178825
[ reached getOption("max.print") -- omitted 129 rows ]
combined <- data.frame(rbind(st_coordinates(hist_baths), st_coordinates(baths_osm)))
# Now I am ssigning marks which need to be a factor
combined$name <- factor(c(rep("H", 21), rep("O", 166)))
combined X Y name
X1 724117.6 6174051 H
X2 725157.1 6175185 H
X3 723296.7 6174805 H
X4 725948.5 6175278 H
X5 725785.8 6180317 H
X6 723559.3 6177700 H
X7 722834.5 6174585 H
X8 720272.8 6173844 H
X9 723678.3 6176934 H
X10 724707.2 6178993 H
X11 722185.4 6176224 H
X12 722122.5 6176588 H
X13 725335.2 6184015 H
X14 725304.9 6180228 H
X15 725785.8 6180317 H
X16 723873.5 6175323 H
X17 724179.9 6175885 H
X18 718230.0 6178817 H
X19 719772.3 6178636 H
X20 721614.8 6178171 H
X21 725785.8 6180317 H
X1.1 725850.1 6177703 O
X2.1 726493.8 6175590 O
X3.1 726459.3 6175654 O
X4.1 725095.5 6176950 O
[ reached 'max' / getOption("max.print") -- omitted 162 rows ]
# Create an sf object out of the dataframe
b_c <- st_as_sf(combined, coords = c("X", "Y"), crs = 32632)
# Convert into a marked ppp and confirm by plotting
b_ppp <- as.ppp(b_c)
b_pppMarked planar point pattern: 187 points
Multitype, with levels = H, O
window: rectangle = [718230, 734413.5] x [6169722, 6184015] units
## Nearest Neighbour Cross Function and Simulation We randomly reassign marks (H, O) within the combined point dataset and then calculate nearest neighbor between the randomly replaced marked points. Run the simulation 999 times
Ripley-K cross function
Maybe we should look at the multi-scale approach to the bathhouses. Check out J.Levente’s Ripley K’cross-function blog and tutorial.
# Set intervals for moving window (you don't have to)
rc <- seq(0, 3000, 100)
# Run the Kcross function
kcross <- Kcross(b_ppp, i="H",j="O",
# r=rc,
correction='none')
plot(kcross) How to explain this chart? It seems that the OSM baths cluster around historical baths, or are attracted to them even at distances. Or in other words, the ‘O’ events are closer to ‘H’ events than we would expect under complete spatial randomness. Look at this chart for explanation https://pro.arcgis.com/en/pro-app/latest/tool-reference/spatial-statistics/multi-distance-spatial-cluster-analysis.htm
How do we test for statistical significance? The question here is whether the H and O events are similarly clustered or not? Statistical difference can be tested with MC simulation with random labelling of points as O or H type (keeping original ratios) and computing the same cross K-function. The simulation mean and the established simulation envelopes tell us whether the observed between-type pattern is statistically significant or not.
kmult <- envelope(b_ppp, fun=Kcross,
nsim=100, i="H", j="O",
#r=rc,
correction='none',
simulate=expression(rlabel(b_ppp))) # are the two patterns similarly clustered or dispersed at different scalesGenerating 100 simulations by evaluating expression ...
1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40,
41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80,
81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100.
Done.
An observed curve within the confidence envelopes means that no matter how we group the points into categories, the pattern we identified in the previous step (by checking on the observed and theoretical values) doesn’t change when randomly assigning events into categories. Here the curve falls outside of the confidence envelopes, meaning that there are differences between the point categories.